Context-adaptive pre-processing scheme for robust speech recognition in fast-varying noise environment
Number of pages: 26; number of tables: 4; number of figures: 1.

List of Figures
Fig. 1. Block diagram of the context-adaptive speech front-end, with GMM-based clustering of the feature space in the first stage and a mapping function for speech enhancement channel selection in the second stage.

List of Tables
Table 1. Speech recognition performance, in terms of word recognition rate (WRR), for different speech enhancement methods, for bi-gram and tri-gram language models.
Table 2. Number of speech utterances for which each speech enhancement method led to the highest WRR.
Table 3. Accuracy (in percent) of selecting the proper speech enhancement method for various implementations of the mapping function, for input vector sizes V = 4, 12, 28, 60, and 124.
Table 4. Word recognition rates (in percent) for various implementations of the mapping function, for different sizes of the input vector

Abstract
Based on the observation that different speech enhancement algorithms perform differently under different types of interference and noise conditions, we propose a context-adaptive speech pre-processing scheme that adaptively selects the most advantageous speech enhancement algorithm for each condition. The selection process relies on unsupervised clustering of the acoustic feature space, followed by a mapping function that identifies the most appropriate speech enhancement channel for each audio input recorded under unknown environmental conditions. Experiments on the MoveOn motorcycle speech and noise database validate the practical value of the proposed scheme and show a significant improvement in speech recognition accuracy over the best-performing individual speech enhancement algorithm, expressed as a gain of 3.3% in word recognition rate.
The advance presented in this work reaches beyond the specifics of the present application and can benefit spoken interfaces operating in fast-varying noise environments.
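The two-stage selection described in the abstract (GMM-based clustering of the feature space, then a mapping function that picks an enhancement channel) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each utterance is assumed to be summarized by a single feature vector, the GMM parameters are assumed to be already trained, and the mapping function is a hypothetical posterior-weighted vote table built from per-utterance "best method" labels (the kind of counts summarized in Table 2). The function names `gmm_posteriors`, `fit_cluster_map`, and `select_channel` are hypothetical; the paper compares several mapping-function implementations, which are not reproduced here.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_posteriors(x, weights, means, variances):
    """Stage 1 (assumed form): posterior of each GMM cluster given x."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max before exponentiating (log-sum-exp trick)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_cluster_map(train_posteriors, best_methods, n_methods):
    """Hypothetical mapping function: posterior-weighted votes per cluster.

    best_methods[i] is the index of the enhancement method that gave the
    highest WRR for training utterance i; table[k][j] accumulates how
    strongly cluster k is associated with method j.
    """
    n_clusters = len(train_posteriors[0])
    table = [[0.0] * n_methods for _ in range(n_clusters)]
    for post, best in zip(train_posteriors, best_methods):
        for k, p in enumerate(post):
            table[k][best] += p
    return table


def select_channel(x, gmm, table):
    """Stage 2: pick the enhancement channel with the highest mapped score."""
    post = gmm_posteriors(x, *gmm)
    n_methods = len(table[0])
    scores = [sum(post[k] * table[k][j] for k in range(len(post)))
              for j in range(n_methods)]
    return max(range(n_methods), key=lambda j: scores[j])


if __name__ == "__main__":
    # Toy 2-D "feature space": two noise clusters, three enhancement methods.
    gmm = ([0.5, 0.5],                 # mixture weights
           [[0.0, 0.0], [5.0, 5.0]],   # component means
           [[1.0, 1.0], [1.0, 1.0]])   # diagonal variances
    train = [[0.0, 0.0], [0.5, -0.3], [5.0, 5.0], [4.6, 5.2]]
    best = [0, 0, 2, 2]                # per-utterance best method (oracle labels)
    table = fit_cluster_map([gmm_posteriors(x, *gmm) for x in train], best, 3)
    print(select_channel([0.2, -0.1], gmm, table))  # → 0
    print(select_channel([4.8, 5.3], gmm, table))   # → 2
```

Using soft posteriors rather than a hard cluster index lets an input that falls between clusters borrow votes from both; a trained classifier (for example, a neural network) could replace the vote table as the mapping function without changing the first stage.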
Journal: Signal Processing
Volume: 91
Published: 2011